PrismDB: Read-aware Log-structured Merge Trees for Heterogeneous Storage
In recent years, emerging hardware storage technologies have focused on
divergent goals: better performance or lower cost-per-bit of storage.
Correspondingly, data systems that employ these new technologies are optimized
either to be fast (but expensive) or cheap (but slow). We take a different
approach: by combining multiple tiers of fast and low-cost storage technologies
within the same system, we can achieve a Pareto-efficient balance between
performance and cost-per-bit.
This paper presents the design and implementation of PrismDB, a novel
log-structured merge tree based key-value store that exploits a full spectrum
of heterogeneous storage technologies (from 3D XPoint to QLC NAND). We
introduce the notion of "read-awareness" to log-structured merge trees, which
allows hot objects to be pinned to faster storage, achieving better tiering and
hot-cold separation of objects. Compared to the standard use of RocksDB on
flash in datacenters today, PrismDB's average throughput on heterogeneous
storage is 2.3x higher and its tail latency is more than an order of
magnitude better, on hardware that costs half as much.
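The "read-awareness" idea can be illustrated with a toy sketch: track per-key read popularity and periodically pin the hottest keys to a small fast tier, demoting the rest to the slow tier. The class and method names below are illustrative, not PrismDB's actual API, and real tiering decisions happen during LSM compaction rather than in a standalone rebalance pass.

```python
from collections import Counter

class TieredStore:
    """Toy sketch of read-aware placement: hot objects are pinned to a
    small fast tier, cold objects live on the large slow tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = {}          # e.g. 3D XPoint
        self.slow = {}          # e.g. QLC NAND
        self.reads = Counter()  # per-key read popularity

    def put(self, key, value):
        (self.fast if key in self.fast else self.slow)[key] = value

    def get(self, key):
        self.reads[key] += 1
        return self.fast.get(key, self.slow.get(key))

    def rebalance(self):
        # Pin the most-read keys to the fast tier; demote everything else.
        hot = {k for k, _ in self.reads.most_common(self.fast_capacity)}
        for k in list(self.slow):
            if k in hot:
                self.fast[k] = self.slow.pop(k)
        for k in list(self.fast):
            if k not in hot:
                self.slow[k] = self.fast.pop(k)

store = TieredStore(fast_capacity=2)
for k in "abcd":
    store.put(k, k.upper())
for _ in range(10):
    store.get("a"); store.get("b")
store.get("c")
store.rebalance()
print(sorted(store.fast))  # -> ['a', 'b']
```

After rebalancing, the two most-read keys occupy the fast tier while the rarely-read keys remain on cheap storage, which is the hot-cold separation the abstract describes.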
Cliffhanger: Scaling Performance Cliffs in Web Memory Caches
Web-scale applications are heavily reliant on memory cache systems such as Memcached to improve throughput and reduce user latency. Small performance improvements in these systems can result in large end-to-end gains. For example, a marginal increase in hit rate of 1% can reduce the application layer latency by over 35%. However, existing web cache resource allocation policies are workload oblivious and first-come-first-serve. By analyzing measurements from a widely used caching service, Memcachier, we demonstrate that existing cache allocation techniques leave significant room for improvement. We develop Cliffhanger, a lightweight iterative algorithm that runs on memory cache servers, which incrementally optimizes the resource allocations across and within applications based on dynamically changing workloads. It has been shown that cache allocation algorithms underperform when there are performance cliffs, in which minor changes in cache allocation cause large changes in the hit rate. We design a novel technique for dealing with performance cliffs incrementally and locally. We demonstrate that for the Memcachier applications, on average, Cliffhanger increases the overall hit rate by 1.2%, reduces the total number of cache misses by 36.7%, and achieves the same hit rate with 45% less memory capacity.
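The incremental core of this approach can be sketched as hill climbing on marginal hit-rate gain: repeatedly move a unit of memory from the application that benefits least from its last unit to the one that would benefit most. This is a simplified stand-in, not Cliffhanger's actual algorithm (which estimates gradients online with shadow queues and handles performance cliffs); the gain curves below are made up for illustration.

```python
# Toy hill-climbing sketch: shift memory units between apps toward
# the allocation where marginal hit-rate gains are equalized.
def rebalance(allocations, marginal_gain, steps=100, unit=1, min_gap=0.01):
    alloc = dict(allocations)
    for _ in range(steps):
        gains = {app: marginal_gain(app, alloc[app]) for app in alloc}
        winner = max(gains, key=gains.get)   # benefits most from +1 unit
        loser = min(gains, key=gains.get)    # benefits least
        if winner == loser or gains[winner] - gains[loser] < min_gap \
                or alloc[loser] <= unit:
            break                            # gains (nearly) equalized
        alloc[loser] -= unit
        alloc[winner] += unit
    return alloc

# Concave marginal-gain curves: app "a" keeps benefiting from extra
# memory much longer than app "b" does.
curve = {"a": lambda m: 1.0 / (1 + m / 50), "b": lambda m: 1.0 / (1 + m / 5)}
final = rebalance({"a": 50, "b": 50}, lambda app, m: curve[app](m))
print(final)  # allocation skewed heavily toward "a"
```

A real implementation cannot query the hit-rate curve directly; estimating that gradient cheaply and coping with non-concave cliffs is exactly the hard part the paper addresses.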
Karma: Resource Allocation for Dynamic Demands
The classical max-min fairness algorithm for resource allocation provides
many desirable properties, e.g., Pareto efficiency, strategy-proofness and
fairness. This paper builds upon the observation that max-min fairness
guarantees these properties under a strong assumption -- user demands being
static over time -- and that, for the realistic case of dynamic user demands,
max-min fairness loses one or more of these properties.
We present Karma, a generalization of max-min fairness for dynamic user
demands. The key insight in Karma is to introduce "memory" into max-min
fairness -- when allocating resources, Karma takes users' past allocations into
account: in each quantum, users donate their unused resources and are assigned
credits when other users borrow these resources; Karma carefully orchestrates
exchange of credits across users (based on their instantaneous demands, donated
resources and borrowed resources), and performs prioritized resource allocation
based on users' credits. We prove theoretically that Karma guarantees Pareto
efficiency, online strategy-proofness, and optimal fairness for dynamic user
demands (without future knowledge of user demands). Empirical evaluations over
production workloads show that these properties translate well into practice:
Karma is able to reduce disparity in performance across users to a bare minimum
while maintaining Pareto-optimal system-wide performance.
Comment: Accepted for publication in USENIX OSDI 202
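The credit mechanism described above can be sketched per quantum: users whose demand falls below their fair share donate the remainder and earn credits, while users demanding more than their fair share receive spare resources in credit-priority order. This is a heavily simplified toy (for instance, credits here can go negative, whereas Karma carefully orchestrates the credit exchange); the function and variable names are hypothetical.

```python
# Toy sketch of one Karma-style allocation quantum over a fixed pool.
def karma_quantum(demands, capacity, credits):
    fair = capacity // len(demands)
    alloc = {u: min(d, fair) for u, d in demands.items()}
    spare = capacity - sum(alloc.values())
    # Borrowers with the most accumulated credits get spare resources first.
    borrowers = sorted((u for u in demands if demands[u] > fair),
                       key=lambda u: -credits[u])
    for u in borrowers:
        take = min(demands[u] - alloc[u], spare)
        alloc[u] += take
        spare -= take
        credits[u] -= take               # spend credits to borrow
    for u in demands:
        if demands[u] < fair:
            credits[u] += fair - demands[u]  # earn credits by donating
    return alloc

credits = {"u1": 0, "u2": 0, "u3": 0}
# Quantum 1: u1 donates its unused share, u3 borrows it.
print(karma_quantum({"u1": 2, "u2": 10, "u3": 18}, 30, credits))
# Quantum 2: demands reverse; u1's earned credits prioritize its borrowing.
print(karma_quantum({"u1": 18, "u2": 10, "u3": 2}, 30, credits))
```

Across the two quanta, u1's low demand in the first is repaid in the second, which is the "memory" that plain max-min fairness lacks.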
Packing Privacy Budget Efficiently
Machine learning (ML) models can leak information about users, and
differential privacy (DP) provides a rigorous way to bound that leakage under a
given budget. This DP budget can be regarded as a new type of compute resource
in workloads of multiple ML models training on user data. Once it is used, the
DP budget is forever consumed. Therefore, it is crucial to allocate it most
efficiently to train as many models as possible. This paper presents a
privacy scheduler that optimizes for efficiency. We formulate privacy
scheduling as a new type of multidimensional knapsack problem, called privacy
knapsack, which maximizes DP budget efficiency. We show that privacy knapsack
is NP-hard, hence practical algorithms are necessarily approximate. We develop
an approximation algorithm for privacy knapsack, DPK, and evaluate it on
microbenchmarks and on a new, synthetic private-ML workload we developed from
the Alibaba ML cluster trace. We show that DPK: (1) often approaches the
efficiency-optimal schedule, (2) consistently schedules more tasks compared to
a state-of-the-art privacy scheduling algorithm that focuses on fairness
(1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some
level of fairness for efficiency. Therefore, using DPK, DP ML operators should
be able to train more models on the same amount of user data while offering the
same privacy guarantee to their users.
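Since privacy knapsack is NP-hard, a natural baseline is a greedy heuristic: treat each data block's remaining DP budget as one knapsack dimension and admit tasks in order of profit per unit of budget consumed. This is a hedged sketch in the spirit of the problem, not the paper's DPK algorithm, and the task names, profits, and epsilon demands are invented for illustration.

```python
# Greedy heuristic for a multidimensional (privacy) knapsack:
# each block has a remaining DP budget; each task demands some epsilon
# from some blocks; admit tasks by profit-per-unit-of-budget.
def greedy_schedule(tasks, budgets):
    remaining = dict(budgets)
    scheduled = []
    order = sorted(tasks,
                   key=lambda t: t["profit"] / sum(t["demand"].values()),
                   reverse=True)
    for t in order:
        if all(remaining[b] >= eps for b, eps in t["demand"].items()):
            for b, eps in t["demand"].items():
                remaining[b] -= eps      # budget, once spent, is gone forever
            scheduled.append(t["name"])
    return scheduled, remaining

tasks = [
    {"name": "modelA", "profit": 3.0, "demand": {"block1": 0.5}},
    {"name": "modelB", "profit": 2.0, "demand": {"block1": 0.5, "block2": 0.5}},
    {"name": "modelC", "profit": 1.0, "demand": {"block2": 0.6}},
]
scheduled, left = greedy_schedule(tasks, {"block1": 1.0, "block2": 1.0})
print(scheduled)  # -> ['modelA', 'modelB']
```

Note how modelC is rejected even though block2 retains some budget: the remaining 0.5 cannot cover its 0.6 demand, reflecting that DP budget, unlike CPU time, is never replenished.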